Longest path problem

In the mathematical discipline of graph theory, the longest path problem is the problem of finding a simple path of maximum length in a given graph. A path is called simple if it does not have any repeated vertices. Unlike the shortest path problem, which asks for the shortest path between two vertices and can be solved in polynomial time in graphs without negative-weight cycles, the decision version of this problem is NP-complete, which means that the optimal solution cannot be found in polynomial time unless P = NP.

The standard decision version of this problem asks whether the graph contains a simple path of length greater than or equal to k, where the length of a path is defined to be the number of edges along the path.

Contents

NP-completeness

The NP-completeness of the decision problem can be shown using a reduction from the Hamiltonian path problem. Clearly, if a certain general graph has a Hamiltonian path, this Hamiltonian path is the longest path possible, as it traverses all possible vertices. To solve the Hamiltonian path problem using an algorithm for the longest path problem, we use the algorithm for the longest path problem on the same input graph and set k=|V|-1, where |V| is the number of vertices in the graph.

If there is a Hamiltonian path in the graph, then the algorithm will return yes, since the Hamiltonian path has length equal to |V|-1. Conversely, if the algorithm outputs yes, there is a simple path of length |V|-1 in the graph. Since it is a simple path of length |V|-1, it is a path that visits all the vertices of the graph without repetition, and this is a Hamiltonian path by definition.

Since the Hamiltonian path problem is NP-complete, this reduction shows that this problem is NP-hard. To show that it is NP-complete, we also have to show that it is in NP. This is easy to see, however, since the certificate for the yes-instance is a description of the path of length k.

Relation to the shortest path problem

The longest path problem can be reduced to the shortest path problem (although the graph may have negative-weight cycles), by exploiting the duality of optimizations (maximizing a positive value is the same as minimizing a negative value). If the input graph to the longest path problem is G, the shortest simple path on the graph H, which is exactly the same as G but with edge weights negated, is the longest simple path on G. However, any positive-weight cycles in the original graph G lead to negative-weight cycles in H. Finding the shortest simple path on a graph with negative-weight cycles is therefore also NP-complete. If G contains no cycles, then H will have no negative-weight cycles, and any shortest-path finding algorithm can now be run on H to solve the original problem in polynomial time. Thus the longest path problem is easy on acyclic graphs.

Algorithms for acyclic graphs

As mentioned above, the longest path problem on acyclic graphs can be solved in polynomial time by negating the edge weights and running a shortest-path finding algorithm. The algorithm described below does not use this reduction and achieves a better time complexity.

Weighted directed acyclic graphs

If G is a directed acyclic graph, the longest path problem on G can be solved in linear time using dynamic programming. Assume that topOrder(G) outputs a sequence of vertices in topological order (This can be computed via a topological sort and this step requires that the input graph is a directed acyclic graph). Furthermore, let V(G) be the set of vertices of G and E(G) be the set of edges in G and, if weights are defined, let weight (Ge) be the weight of an edge e in the graph G (if the graph is unweighted, insert an arbitrary constant other than zero for any occurrence of weight (G,e)). Then the following algorithm computes the length of the longest path:

algorithm dag-longest-path is
    input: 
         Directed acyclic graph G
    output: 
         Length of the longest path

    length_to = array with |V(G)| elements of type int with default value 0

    for each vertex v in topOrder(G) do
        for each edge (v, w) in E(G) do
            if length_to[w] <= length_to[v] + weight(G,(v,w)) then
                length_to[w] = length_to[v] + weight(G, (v,w))
 
    return max(length_to[v] for v in V(G))

Correctness can be checked as follows: The topological order traverses the given graph in layers of ascending distance from the source vertices of the graph (at first, all vertex with distance 0 from the source vertices, then all vertices with distance 1 from the source vertices, ...). For each vertex, it clearly holds that a path of maximum length consists of a path of maximum distance through all the layers which are closer to the source than this vertex and also an edge of maximum length connecting the longest path up to this node with this node. This is exactly length_to(w) = max(weight(G, (vw)) + length_to(v)) for all edges (v,w) in G, just written in prose. Considering that the topological order traverses all possible vertices v on the previous layer before traversing w shows that the implementation computes exactly this maximum for each vertex and thus, the maximum of all lengths to a vertex is the correct length.

The worst-case running time of this algorithm is O(T + |V| + |E| + |V|), if T is the time required by the topological order. Assuming a standard algorithm for computing the topological order, this simplifies into O(|V| + |E| + |V| + |E| + |V|), which simplifies further into O(|V| + |E|).

Furthermore, extending this algorithm to compute the actual path is straightforward. In order to do this, it is necessary to introduce a predecessor (or successor)-array which is updated whenever the length_to is modified. Given this, one can try to reconstruct the path from the sources and sinks of the directed acyclic graph and once there is a path of maximum length reconstructed, this is one of the longest paths.

Related problems

External links